[GPU] Extend CuVSResourcesManager #137588

ldematte · 2025-11-04T16:49:10Z

CuVSResourceManager controls access to GPU resources. It enforces a suitable level of parallelism (more than one GPU thread, but with a reasonable upper bound) and limits the amount of GPU memory in use to prevent CUDA out-of-memory errors.

This PR extends the memory control part in two ways. First, it introduces different strategies for memory accounting: "real", based on API calls to the device, and "tracking", which remembers the amount of memory requested during acquisition. Second, it introduces different memory estimations depending on the CAGRA graph build algorithm.

The former will allow us to use pooled memory (where the amount of available memory differs from the free device memory); the latter will allow us to use the IVFPQ CAGRA graph build algorithm for larger datasets.
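The difference between the two accounting strategies can be illustrated with a minimal sketch. All names here (`MemoryAccounting`, `DeviceQueryAccounting`, `TrackingAccounting`, `MemoryGate`) are hypothetical and are not the classes added by this PR. The device query is stubbed with a `LongSupplier`, since the actual device API call is not shown in this description.

```java
import java.util.function.LongSupplier;

public class GpuMemoryAccountingSketch {

    /** Hypothetical strategy: answers "how much GPU memory can still be handed out?" */
    interface MemoryAccounting {
        long availableBytes();
        void onAcquire(long bytes);
        void onRelease(long bytes);
    }

    /** "Real" accounting: asks the device (stubbed here) for its free memory every time. */
    static final class DeviceQueryAccounting implements MemoryAccounting {
        private final LongSupplier freeDeviceMemory; // assumption: stands in for a device API call

        DeviceQueryAccounting(LongSupplier freeDeviceMemory) {
            this.freeDeviceMemory = freeDeviceMemory;
        }

        @Override public long availableBytes() { return freeDeviceMemory.getAsLong(); }
        @Override public void onAcquire(long bytes) { /* the device itself reflects usage */ }
        @Override public void onRelease(long bytes) { /* nothing to remember */ }
    }

    /** "Tracking" accounting: remembers what was requested at acquisition time. */
    static final class TrackingAccounting implements MemoryAccounting {
        private final long totalBytes;
        private long reservedBytes;

        TrackingAccounting(long totalBytes) {
            this.totalBytes = totalBytes;
        }

        @Override public long availableBytes() { return totalBytes - reservedBytes; }
        @Override public void onAcquire(long bytes) { reservedBytes += bytes; }
        @Override public void onRelease(long bytes) { reservedBytes -= bytes; }
    }

    /** Admits a request only if the chosen strategy reports enough available memory. */
    static final class MemoryGate {
        private final MemoryAccounting accounting;

        MemoryGate(MemoryAccounting accounting) {
            this.accounting = accounting;
        }

        synchronized boolean tryAcquire(long estimatedBytes) {
            if (estimatedBytes > accounting.availableBytes()) {
                return false;
            }
            accounting.onAcquire(estimatedBytes);
            return true;
        }

        synchronized void release(long bytes) {
            accounting.onRelease(bytes);
        }
    }

    public static void main(String[] args) {
        MemoryGate gate = new MemoryGate(new TrackingAccounting(1_000));
        System.out.println("first 600-byte request admitted: " + gate.tryAcquire(600));
        System.out.println("second 600-byte request admitted: " + gate.tryAcquire(600));
        gate.release(600);
        System.out.println("request after release admitted: " + gate.tryAcquire(600));
    }
}
```

With a memory pool, the pool may already hold device memory that is free for reuse, so the free memory reported by the device would understate what is available. Under that assumption, a tracking strategy that counts its own reservations is the more suitable choice.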

elasticsearchmachine · 2025-11-04T16:49:36Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

ChrisHegarty

LGTM

elasticsearchmachine · 2025-11-05T13:23:59Z

💚 Backport successful

Status	Branch	Result
✅	9.2


ldematte added 2 commits November 4, 2025 15:10

Abstract GPU memory tracking

5050166

Include Cagra Params (algo + parameters) in occupancy estimation

210cc1f

ldematte requested review from ChrisHegarty and mayya-sharipova November 4, 2025 16:49

ldematte added the >non-issue, auto-backport, :Search Relevance/Vectors, test-gpu, v9.3.0, and branch:9.2 labels Nov 4, 2025

elasticsearchmachine added the v9.2.1 label Nov 4, 2025

elasticsearchmachine added the Team:Search Relevance label and removed the branch:9.2 label Nov 4, 2025

ChrisHegarty approved these changes Nov 4, 2025

x-pack/plugin/gpu/src/main/java/org/elasticsearch/xpack/gpu/codec/TrackingGPUMemoryService.java

mayya-sharipova reviewed Nov 4, 2025

x-pack/plugin/gpu/src/main/java/org/elasticsearch/xpack/gpu/codec/CuVSResourceManager.java

mayya-sharipova reviewed Nov 4, 2025

x-pack/plugin/gpu/src/main/java/org/elasticsearch/xpack/gpu/codec/CuVSResourceManager.java

ldematte added 2 commits November 5, 2025 11:34

Merge remote-tracking branch 'upstream/main' into gpu/improve-resource-manager

8416562

PR comments: added javadoc, fixed unlock issue (+ test for it)

97d465e

ldematte removed the test-gpu label Nov 5, 2025

ldematte enabled auto-merge (squash) November 5, 2025 11:09

ldematte merged commit 136677b into elastic:main Nov 5, 2025
34 checks passed

ldematte mentioned this pull request Nov 5, 2025

[9.2] [GPU] Extend CuVSResourcesManager (#137588) #137621

Merged

ldematte deleted the gpu/improve-resource-manager branch November 5, 2025 14:37
